{24 () Parallel Formulations of Decision-tree Classiication Algorithms
نویسندگان
چکیده
Classiication decision tree algorithms are used extensively for data mining in many domains such as retail target marketing, fraud detection, etc. Highly parallel algorithms for constructing classiication decision trees are desirable for dealing with large data sets in reasonable amount of time. Algorithms for building classiication decision trees have a natural concurrency, but are diicult to parallelize due to the inherent dynamic nature of the computation. In this paper, we present parallel formulations of classiication decision tree learning algorithm based on induction. We describe two basic parallel formulations. One is based on Synchronous Tree Construction Approach and the other is based on Partitioned Tree Construction Approach. We discuss the advantages and disadvantages of using these methods and propose a hybrid method that employs the good features of these methods. We also provide the analysis of the cost of computation and communication of the proposed hybrid method. Moreover, experimental results on an IBM SP-2 demonstrate excellent speedups and scalability.
منابع مشابه
Eecient Parallel Classiication Using Dimensional Aggregates
Multidimensional aggregates are frequently computed to improve query performance in Online Analytical Processing applications. We present a new method for decision tree based classiication trees using the aggregates computed in the multidimensional data model. The structure imposed on data in a explicit multidimensional storage mechanism leads to eecient dimensional operations. Decision tree ba...
متن کاملrid Age Car Type Risk 0 23 family High 1 17 sports High
Classiication is an important data mining problem. Although classiication is a well-studied problem, most of the current classi-cation algorithms require that all or a portion of the the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classiication algorithm , called SPRINT that removes all of the m...
متن کاملDynamic Load Balancing of Unstructured Computations in Decision Tree Classifiers
One of the important problems in data mining is discovering classification models from datasets. Application domains include retail target marketing, fraud detection, and design of telecommunication service plans. Highly parallel algorithms for constructing classification decision trees are desirable for dealing with large data sets. Algorithms for building classification decision trees have a ...
متن کاملComparing Connectionist and Symbolic Learning Methods
Experimental comparison of back-propagation and decision tree methods have provided many data points but less understanding of why one method works better for some tasks than for others. This paper observes that, just as there are sequential and parallel classiication methods , there are certain classiication tasks that lend themselves to methods of one or the other type.
متن کاملEarly Prediction of Gestational Diabetes Using Decision Tree and Artificial Neural Network Algorithms
Introduction: Gestational diabetes is associated with many short-term and long-term complications in mothers and newborns; hence, the detection of its risk factors can contribute to the timely diagnosis and prevention of relevant complications. The present study aimed to design and compare Gestational diabetes mellitus (GDM) prediction models using artificial intelligence algorithms. Materials ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1998